A Predictive Rule for COVID-19 Pneumonia Among COVID-19 Patients: A Classification and Regression Tree (CART) Analysis Model

Background: In this study, we aimed to identify predictive factors for pneumonia in patients with coronavirus disease 2019 (COVID-19) and to determine which COVID-19 patients should undergo computed tomography (CT), using classification and regression tree (CART) analysis. Methods: This retrospective cross-sectional survey was conducted at a university hospital. We recruited patients diagnosed with COVID-19 between January 1 and December 31, 2020. We extracted clinical information (e.g., vital signs, symptoms, laboratory results, and CT findings) from patient records. Factors potentially predicting COVID-19 pneumonia were analyzed using Student's t-test, the chi-square test, and a CART analysis model. Results: Among 221 patients (119 men [53.8%]; mean age, 54.59 ± 18.61 years), 160 (72.4%) had pneumonia. The CART analysis revealed that patients were at high risk of pneumonia if they had C-reactive protein (CRP) levels of >1.60 mg/dL (incidence of pneumonia: 95.7%); CRP levels of ≤1.60 mg/dL, age >35.5 years, and lactate dehydrogenase (LDH) >225.5 IU/L (incidence of pneumonia: 95.5%); or CRP levels of ≤1.60 mg/dL, age >35.5 years, LDH ≤225.5 IU/L, and hemoglobin ≤14.65 g/dL (incidence of pneumonia: 69.6%). The area under the receiver operating characteristic curve of the model was 0.860 (95% CI: 0.804-0.915), indicating good discriminative ability. Conclusions: These results are useful for deciding whether to perform CT in COVID-19 patients. High-risk patients, such as those described above, should undergo CT.


Introduction
In December 2019, a series of pneumonia cases of unknown cause were reported, with clinical presentations that closely resembled viral pneumonia [1]. The coronavirus disease 2019 (COVID-19) pandemic has been a great threat to human life. The outbreak of coronavirus was initially reported to the World Health Organization on December 31, 2019 [2]. More than 45 million people have been infected with COVID-19, with 1.2 million deaths reported [2]. The most common method of diagnosing COVID-19 is a molecular genetic assay that detects viral RNA in a clinical sample using reverse transcription-polymerase chain reaction (RT-PCR) [3]. Many case reports on COVID-19 pneumonia have been published [4-6]. One study described chest computed tomography (CT) findings in 51 COVID-19-positive patients, with 77%, 75%, and 59% of cases showing pure ground-glass opacities (GGOs), GGOs with interstitial and/or interlobular septal thickening, and GGOs with consolidation, respectively [4]. Recent studies have also examined the relationship between CT findings and the severity of hematologic and clinical findings, which is an interesting topic [5,6]. In addition, COVID-19 symptoms not typically seen in past viral infections (e.g., anosmia and ageusia) have been reported [7,8]. The wide range of reported features is thought to reflect the effects of COVID-19 on non-respiratory systems and indicates that signs of the disease may be observed in infected patients without apparent respiratory symptoms [9]. Therefore, COVID-19 requires continued vigilance and management.
One study suggested that, during the COVID-19 pandemic, focusing on ruling out COVID-19 infection with CT scans led to an overall delay in the diagnosis and treatment of bacteremia [10]. If patients likely to have COVID-19 pneumonia could be distinguished from those unlikely to have it, unnecessary examinations (e.g., CT scans) could be avoided.
Unfortunately, predicting COVID-19 pneumonia is more difficult than predicting other forms of pneumonia.
Although fever and decreased oxygenation are typical signs of pneumonia [11], as described above, patients without symptoms may still develop critical pneumonia [2]. Furthermore, the presence or absence of pneumonia is clinically important, even in asymptomatic patients, because it affects follow-up policy; pneumonia is assumed to increase the risk of clinical deterioration. However, predictive factors for COVID-19 pneumonia, including clinical parameters such as symptoms, vital signs, and laboratory data, have not been compared directly between cases with and without complicated pneumonia using decision tree analysis. Moreover, the presence of pneumonia on CT is an important determinant in the decision to treat, so being able to predict whether pneumonia is present would be valuable in clinical settings.
In this retrospective cross-sectional survey, we aimed to compare clinical parameters in patients with and without complicated pneumonia to identify predictive factors for COVID-19 patients with complicated pneumonia and to determine which COVID-19 patients should undergo CT.

Study design and population
In this retrospective cross-sectional survey performed at Juntendo University Nerima Hospital, a 490-bed university-affiliated hospital in Tokyo, Japan, we recruited patients diagnosed with COVID-19 between January 1 and December 31, 2020. We extracted clinical information through a chart review. We collected data on age, sex, history of malignant disease, asthma, heart disease (including hypertension), diabetes mellitus, blood dyscrasia, human immunodeficiency virus infection, use of immunosuppressive agents (including steroids), and general symptoms (chills, malaise, joint pain, headache, nausea, diarrhea, smell disturbances, taste disturbances, sore throat, cough, and difficulty breathing). We also extracted data on axillary body temperature, blood pressure, pulse rate, respiratory rate, oxygen saturation (room air), white blood cell count, percentages of neutrophils and lymphocytes, hemoglobin level, platelet count, red blood cell distribution width, serum parameters (total protein, albumin, lactate dehydrogenase (LDH), blood urea nitrogen, creatinine, sodium, potassium, chloride, glucose, aspartate aminotransferase (AST), alanine aminotransferase (ALT), total bilirubin, hemoglobin A1c, and C-reactive protein (CRP) levels), and CT findings of pneumonia.

Statistical analysis
We performed bivariate comparisons between COVID-19 patients with and without complicated pneumonia using independent-samples t-tests for continuous data and chi-square tests for categorical data. To create a prediction model for COVID-19 patients with complicated pneumonia, we used the classification and regression tree (CART) methodology to identify patients at different levels of risk. The CART model is well suited to generating clinical decision rules and has been used to develop prediction models in various fields, including medicine [12]. CART is a non-parametric statistical technique that makes no distributional assumptions about either the dependent or the independent variables [13]. A decision tree is created by recursively partitioning the initial dataset, which contains all potential predictors, into subsets. Each split is chosen on the basis of node "impurity" [14], which is measured by the Gini diversity index (GI):

$$GI(t) = 1 - \sum_{j} P_t(j)^2$$

$$\Delta GI(t) = GI(t) - P_L \, GI(t_L) - P_R \, GI(t_R)$$

where ΔGI(t) represents the decrease in GI achieved by splitting node t, GI(t) represents the Gini diversity index at node t, P_t(j) represents the proportion of subjects in class j (with or without pneumonia) at node t before partition, t_L and t_R are the left and right nodes after partition, and P_L and P_R represent the proportions of the subjects of node t assigned to the left and right nodes, respectively. The risk factor and cut-off value that maximize the decrease in impurity, ΔGI(t), are selected as the branch point. This process is repeated for each derived subset until the impurity of the subset is no longer improved by additional splitting [12].
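The Gini-based split search described above can be sketched in Python. This is a minimal illustration, not the SPSS implementation used in the study. It assumes a binary outcome coded 1 for pneumonia and 0 otherwise, handles a single continuous predictor, and places candidate cut-offs at midpoints between adjacent observed values; the midpoint rule is an assumption of this sketch, and the function names are hypothetical. The `min_child` argument mirrors the minimum subgroup size of 20 subjects used in this study.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini diversity index GI(t) = 1 - sum_j P_t(j)^2 for a node with 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n  # proportion with pneumonia (label 1)
    p0 = 1.0 - p1         # proportion without pneumonia (label 0)
    return 1.0 - (p0 ** 2 + p1 ** 2)


def gini_decrease(parent: Sequence[int], left: Sequence[int], right: Sequence[int]) -> float:
    """Delta GI(t) = GI(t) - P_L * GI(t_L) - P_R * GI(t_R)."""
    n = len(parent)
    p_left = len(left) / n
    p_right = len(right) / n
    return gini(parent) - p_left * gini(left) - p_right * gini(right)


def best_cutoff(
    x: Sequence[float], y: Sequence[int], min_child: int = 20
) -> Optional[Tuple[float, float]]:
    """Return (cut_off, delta_gi) for the rule x <= cut_off that maximizes Delta GI.

    Candidate cut-offs are midpoints between adjacent distinct sorted values
    (an assumption of this sketch). Splits leaving fewer than `min_child`
    subjects on either side are skipped. Returns None if no split is admissible.
    """
    pairs = sorted(zip(x, y))
    values: List[float] = [v for v, _ in pairs]
    labels: List[int] = [lab for _, lab in pairs]
    n = len(pairs)
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # subjects with identical values cannot be separated
        if i < min_child or n - i < min_child:
            continue  # one child node would be too small
        cut = (values[i - 1] + values[i]) / 2
        delta = gini_decrease(labels, labels[:i], labels[i:])
        if best is None or delta > best[1]:
            best = (cut, delta)
    return best
```

For example, `best_cutoff([1, 2, 3, 4], [0, 0, 1, 1], min_child=1)` returns `(2.5, 0.5)`, because that split produces two pure nodes. Growing a full tree would apply this search to every candidate predictor at a node, keep the predictor with the largest ΔGI, and recurse into each child until no admissible split improves impurity.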
In this study, 48 candidate variables were entered into the CART analysis. A node required at least 40 subjects to be considered for further splitting, and each resulting subgroup required at least 20 subjects. A receiver operating characteristic (ROC) curve was drawn from the results of the CART analysis, and the area under the curve (AUC) was calculated to evaluate the accuracy of the tree. After arriving at the final decision tree, we performed cross-validation to derive the standard error of the branches of the CART model. All analyses were conducted using IBM SPSS Statistics for Windows, Version 27 (released 2020; IBM Corp., Armonk, New York, United States), except for the calculation of the 95% confidence intervals (CIs), which was based on the exact binomial method using Stata version 16.1 (StataCorp, College Station, Texas, United States) [15].
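The evaluation steps above can be sketched in pure Python under stated assumptions. Because a CART tree assigns every patient in the same terminal node the same predicted probability (that node's observed incidence of pneumonia), this sketch assumes the ROC curve is built from those node-level scores; the AUC is then computed with the Mann-Whitney formulation, counting ties as one half. The second function computes the Clopper-Pearson exact binomial interval for a proportion, such as the incidence of pneumonia within a node. Whether this matches exactly how Stata computed each interval reported here, including the AUC interval, is an assumption, and all function names are hypothetical.

```python
from math import comb
from typing import Callable, Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without the outcome")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); plain summation, adequate for n of a few hundred."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g: Callable[[float], float], target: float) -> float:
    """Bisection on [0, 1] for g(p) = target, assuming g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided 1 - alpha interval for k successes out of n."""
    if not 0 <= k <= n or n == 0:
        raise ValueError("require 0 <= k <= n and n > 0")
    # Lower bound solves P(X >= k | p) = alpha / 2; P(X >= k) increases with p.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1.0 - _binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha / 2, i.e. 1 - P(X <= k) = 1 - alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1.0 - _binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper
```

As a usage example, `clopper_pearson(160, 221)` gives an exact 95% interval around the overall pneumonia proportion of 72.4%. Bisection is used instead of a beta-distribution quantile so that the sketch needs only the standard library.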
This study was conducted in accordance with the relevant guidelines and regulations and was approved by the institutional ethics committee of Juntendo University, Tokyo, Japan (approval number: E21-0065-N05). Because this was an observational study conducted as part of a public health outbreak investigation, the ethics committee waived the requirement for written informed consent.

Discussion
We directly compared COVID-19 patients with and without pneumonia using CART analysis to determine which patients should undergo CT. The four identified predictors were CRP level, age, LDH level, and hemoglobin (Hb) level. High-risk patients should undergo CT. The area under the ROC curve (0.860) indicated acceptable accuracy.
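The three high-risk groups reported in the abstract can be written as a small rule function. This is an illustrative sketch only: the thresholds and node incidences are those reported for this single-center cohort, the labels follow the numbering used in this Discussion (the LDH branch is "high risk 2"), and the class, constant, and function names are hypothetical. Patients outside the three reported groups return None, which should not be read as "low risk", because the incidences in the remaining nodes are not given in this text. The rule has not been externally validated.

```python
from dataclasses import dataclass
from typing import Optional

# Node incidences of pneumonia reported for this cohort (not a validated risk score).
REPORTED_INCIDENCE = {"high risk 1": 0.957, "high risk 2": 0.955, "high risk 3": 0.696}


@dataclass
class Patient:
    crp_mg_dl: float  # C-reactive protein, mg/dL
    age_years: float  # age, years
    ldh_iu_l: float   # lactate dehydrogenase, IU/L
    hb_g_dl: float    # hemoglobin, g/dL


def pneumonia_risk_group(p: Patient) -> Optional[str]:
    """Return the reported high-risk group label, or None if no reported group applies."""
    if p.crp_mg_dl > 1.60:
        return "high risk 1"
    # Every branch below has CRP <= 1.60 mg/dL.
    if p.age_years > 35.5:
        if p.ldh_iu_l > 225.5:
            return "high risk 2"
        # Here LDH <= 225.5 IU/L.
        if p.hb_g_dl <= 14.65:
            return "high risk 3"
    return None
```

For example, `pneumonia_risk_group(Patient(crp_mg_dl=1.2, age_years=62, ldh_iu_l=240, hb_g_dl=13.8))` returns `"high risk 2"`. Note that a CRP level of exactly 1.60 mg/dL falls into the ≤1.60 mg/dL branch, matching the inequalities reported in the abstract.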
CRP is a pentameric protein synthesized by the liver whose levels increase with inflammation. It is an acute-phase protein that responds to inflammatory cytokines associated with monocytes or macrophages activated after infection. CRP is primarily induced by interleukin-6, which drives CRP gene transcription. In some cases, CRP activates the complement system and promotes inflammatory cytokine production, thereby further aggravating tissue damage [16]. The CRP level correlates with CT findings and can predict severe COVID-19 because it increases significantly at the initial stage of the disease [17]. The CRP level is also positively correlated with the diameter of lung lesions and with severe presentation [18]. We believe that these findings are consistent with our results. In addition, our study identified specific cut-off values (e.g., a CRP level of 1.60 mg/dL), which are useful in determining whether CT should be performed. Obtaining daily CRP values for hospitalized COVID-19 patients can also provide early thresholds and facilitate risk stratification and prognostication [19]. Therefore, it is important to monitor changes in CRP levels in COVID-19 patients.
LDH is an intracellular enzyme found in the cells of almost all organ systems that catalyzes the interconversion of pyruvate and lactate, with concomitant interconversion of NADH and NAD+ [20]. The enzyme is composed of subunits A and B and is present in humans as five separate isozymes (LDH-1, LDH-2, LDH-3, LDH-4, and LDH-5, found predominantly in cardiomyocytes, the reticuloendothelial system, pneumocytes, the kidneys and pancreas, and the liver and striated muscle, respectively). Although LDH has traditionally been used since the 1960s as a marker of cardiac damage, abnormal values can result from injury to multiple organs and from decreased oxygenation, along with upregulation of the glycolytic pathway. The acidic extracellular pH resulting from increased lactate during infection and tissue injury triggers the activation of metalloproteases and enhances macrophage-mediated angiogenesis [21]. Furthermore, severe infections may cause cytokine-mediated tissue damage and LDH release. The isozyme predominantly found in lung tissue is LDH-3. Patients with severe COVID-19 can release greater amounts of LDH into the circulation, often in the setting of acute respiratory distress syndrome, the hallmark of severe disease [22]. Moreover, compared with patients with non-COVID-19 pneumonia, patients with COVID-19 pneumonia have higher AST, ALT, LDH, γ-glutamyl transpeptidase (γ-GT), and α-hydroxybutyric dehydrogenase levels [23]. Although AST, ALT, and γ-GT levels were not predictors in this CART analysis, we identified an LDH level of >225.5 IU/L as a predictive factor and one of the high-risk criteria for COVID-19 pneumonia (high risk 2). Other studies have also associated elevated LDH levels with severe disease and mortality in COVID-19 patients [22]. Elevated LDH levels suggest that multiple organ injury and failure may play a more prominent pathological role in determining clinical outcomes in COVID-19 patients [22]. Therefore, LDH levels should be closely monitored, as they reflect the patient's overall status and the course of the disease.
LDH and CRP levels may be related to respiratory function (PaO2/FiO2) and may predict respiratory failure in COVID-19 patients. LDH and CRP should be considered useful for identifying patients who require closer respiratory monitoring and more aggressive early supportive therapy to avoid a poor prognosis [24]. LDH and CRP may therefore be important in predicting not only the presence but also the severity of pneumonia.
The mechanisms of Hb reduction during sepsis vary and may include altered microcirculation, decreased red blood cell (RBC) production, pre-existing chronic anemia, hemodilution, and increased RBC destruction due to altered RBC membranes [24]. In addition, other studies have highlighted the relationship between low initial Hb levels and mortality in conditions such as septic shock, and early treatment of patients with low initial Hb levels is thought to contribute to reduced mortality. However, the Hb cut-off value in this study (14.65 g/dL) was within the normal range. This topic should be examined further in future studies. Similarly, the age finding in this study may prove controversial. Other studies have reported that the age at onset of COVID-19 is lower than that of influenza [25]. We believe that this reflects the strong infectivity of COVID-19, regardless of medical history and strength of immunity. This topic should also be examined in future studies.
Similar studies on predictors of COVID-19 severity have been performed in other countries [26]; using chi-square automatic interaction detection (CHAID) analysis, they found CRP, LDH, neutrophil-to-lymphocyte ratio, age, lymphocyte count, and malignancy to be associated with intubation [26,27]. Notably, CRP and LDH were also predictors in our study, although there they predicted the presence of pneumonia rather than intubation.
This study only included COVID-19 cases from before vaccines became widely available. Although the data were not representative of all of Japan, the overall prevalence of vaccine hesitancy among working-age adults in Japan in 2021 was reported to be 5.5% [28]. As vaccination becomes even more widespread, the performance of these predictors of COVID-19 pneumonia will most likely change (it should also be noted that the present study predates the Omicron variant).
Furthermore, COVID-19 was phenotypically milder in Japan than in other countries, despite relatively less restrictive preventive measures. Factors related to a possible reduced susceptibility to the pulmonary manifestations of SARS-CoV-2 may have contributed to better outcomes and lower mortality in Japan [29]. In addition, during the study period, treatment for COVID-19 had not yet been established anywhere in the world, including Japan. Established treatments, including remdesivir, are now available [30]. Thus, our results may differ from those that would be obtained as of 2023.
A strength of our study is that it encourages physicians to consider COVID-19 pneumonia, which remains an ongoing challenge. Early detection of pneumonia should improve management and decrease mortality in COVID-19 patients. To our knowledge, no previous study has analyzed factors predictive of COVID-19 pneumonia using CART analysis. We aimed to develop precise criteria for performing CT in COVID-19 patients. Our study also had some limitations. The study population was limited to a single hospital. Additionally, this was a retrospective study, and COVID-19 variants were not considered. Therefore, we propose conducting a multicenter prospective study with a larger number of patients in the future. Moreover, we did not calculate the sample size because all COVID-19 patients who underwent blood tests and CT at our hospital between January 1, 2020, and December 31, 2020, were included. Future prospective studies should calculate the required sample size in advance.
Furthermore, a limitation inherent to CART is that variables and cut-off values are selected on the basis of node purity, so the independent impact of each complication on patients cannot be evaluated. In addition, this study included a mix of severely ill patients (such as those admitted to the intensive care unit) and mildly ill patients; if similar studies were conducted separately in severely ill and mildly ill patients, the results might differ.
This study did not use a questionnaire.Our model should also be validated in the future.

Conclusions
We aimed to compare clinical parameters in patients with and without complicated pneumonia to identify predictive factors for COVID-19 patients with complicated pneumonia and determine which COVID-19 patients should undergo CT.
We propose a predictive model based on CART analysis for pneumonia in patients with COVID-19,

FIGURE 2: The area under the receiver operating characteristic curve of the model was 0.860 (95% confidence interval, 0.804-0.915).
Although the Centers for Disease Control and Prevention (CDC) continually publishes its latest findings on the underlying causes of COVID-19 severity, we emphasize that our study is different in that it proposes a new predictive model.

TABLE 1: Patient variables and results of bivariate analysis
Table 1 shows the patient characteristics and the results of the bivariate analysis.